perm filename ACM.1[TLK,DBL] blob sn#199368 filedate 1976-01-30 generic text, type T, neo UTF8
	INTRODUCTION (1 min)

I'm going to talk today about the  process of discovery and invention
in  empirical  science.   My thesis,  which  I'll describe  in  a few
minutes, is a  first step  toward automating theory  formation in  an
elementary subset of mathematics.

The first thing I want to stress is the difference between solving a
given problem and thinking it up in the first place.  Contrast
playing Monopoly with inventing new games of the same quality.  Or
contrast solving the Missionaries and Cannibals problem with the
ill-defined task of formulating it.  If I give you the definition of
prime numbers, no doubt you can find some for me, but how could you
have been motivated to propose that definition yourself?

	EXPLAINING THE DISCOVERY OF PRIMES (1 min)

This leads us  to wonder how we could explain a discovery, like prime
numbers.  Why did anybody decide they were worth investigating, worth
giving a name to?  Here's one plausible scenario for their discovery:
A researcher has just discovered the concept of "divisors-of" a
number.  He knows that very often it is worthwhile to examine extreme
cases, so he poses the question "Which numbers are minimally
divisible, which have as few divisors as possible?"  He formulates the
set of numbers with precisely one divisor; this is just the singleton
set {1}.  It's too small and already well-understood.  Well, what
about numbers with precisely 2 divisors?  That turns out to be the
prime numbers.  Soon he discovers unique factorization or some other
important relationship involving these numbers, and decides they're
interesting and worth naming.

	PROBLEM-REDUCTION (2 min)

What have we really done here?  We reduced the  problem of explaining
the discovery of prime numbers, to the simpler task of explaining the
discovery of "divisors-of". The reduction was accomplished by  citing
a  heuristic, which  said  to investigate  which  items give  extreme
results under some function.  This is a pretty general heuristic.

We could  continue this process, reducing this  task to the discovery
of  multiplication, by  citing  a  heuristic  which  said  to  always
investigate the inverse of interesting relations. This process can be
continued, until  we've reduced our original task to the discovery of
concepts we consider to be primitive.

These reductions build up a chain  of concepts,  stretching from  the
given discovery  backwards, all the way to  known concepts. Each link
in the chain  is justified by  some more or  less general  heuristic.
The chain is a plausible scenario for discovering this concept.  When
we  hear  of a  new  discovery, we mentally try to create  this chain
ourselves; if we  succeed easily,  then we might  adopt the  attitude
that  the discovery  wasn't  so  hard, that  we  could  have done  it
ourselves.

	INVERTING THIS: TREE-GROWING (3.5 min) numerical model

Of course, the process of invention proceeds forward, from known
concepts toward new discoveries, not backward from the discovery.
If our tree-searching model is at all accurate, the
researcher finds himself with a large base of existing knowledge, and
several  heuristic  operators  which   suggest  things  for  him   to
investigate.    Let's  do a  trivial  algorithmic  analysis  of  this
situation.  Say there are h of these operators and k concepts; then
the researcher might be staring at something like hk possible avenues
for research.   Notice how  vicious this combinatorial  explosion is.
There  are (h↑2)k concepts to consider after a two-step search. Since
investigating any  one of  these concepts might  take a  considerable
period of time, this process has a low success rate.

With this same simple model, let's look again at the analysis of a
given discovery.  Well, there are only h operators that might have
led to the discovery, and h that might have led to each of those
predecessors, so we are comparing h↑2 with (h↑2)k.

Typically, an  expert will  know thousands  of concepts (k=2500)  and
tens of  kinds of operators to suggest new  things to look at (h=20).
So discovering  some  concept  is  like searching  a  space  of  size
(20↑2)x2500=1,000,000. But analyzing  that discovery only  requires a
search of 20↑2 = 400 things.
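The arithmetic above can be checked with a few lines of code (a
sketch; the function names are mine, but the figures are the talk's):

```python
# Back-of-the-envelope model of invention vs. analysis of a discovery.

def discovery_space(h: int, k: int, s: int) -> int:
    """Number of potential s-step discoveries: (h^s) * k."""
    return (h ** s) * k

def analysis_space(h: int, s: int) -> int:
    """Analyzing a known discovery: h choices per backward step."""
    return h ** s

# The expert of the talk: h = 20 operators, k = 2500 concepts, 2 steps.
print(discovery_space(20, 2500, 2))  # 1000000 avenues to invent something
print(analysis_space(20, 2))         # 400 avenues to rationalize it
```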

So producing a discovery is quite a bit harder than rationalizing how
you might have done it, because in the analysis  process you at least
know  one  end-point  in  this  2-step  chain.    This  explains  why
discoveries sometimes seem so obvious  after the fact, even to  their
inventor, who may have labored years on the problem.

If we introduce the use of heuristic strategies in our model, these
numbers shrink dramatically.  The total number of potential s-step
discoveries is (h↑s)k.  Strategies have two effects on this formula.
If a strategy suggests which operations will be the most fruitful to
apply from any given node, that effectively reduces the size of h.
If a strategy tells you to focus in on specific nodes which seem
promising (and to ignore all the rest), that effectively reduces k.
A good researcher must know -- and use -- heuristic strategies which
produce both these effects.
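The shrinkage is easy to see numerically.  In this sketch the
particular reduction factors are invented for illustration; only the
(h↑s)k formula is from the talk:

```python
# How strategies shrink the (h^s) * k search space.

def space(h: int, k: int, s: int) -> int:
    return (h ** s) * k

h, k, s = 25, 100, 7
print(space(h, k, s))            # full space: 610,351,562,500

# Strategy 1 prunes unpromising operators (reduces h, here by 5x).
# Strategy 2 focuses on a few promising nodes (reduces k, here by 10x).
print(space(h // 5, k // 10, s))  # pruned space: 781,250
```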

	MY THESIS: AM (1 min)

I'm  saying that  theory  formation  can  be considered  a  heuristic
search:  That is,  a continual  expansion of a  tree of  concepts and
relationships, guided  by some  judgmental criteria,  some  heuristic
rules of thumb.

If this is true, one should be able to write a computer program which
proposes  new concepts in some scientific  domain.  After picking the
field in which the system will work, there are 3 things you'd have to
do: 
(1) specify  the initial core of  knowledge that it  will start with,
(2) specify  the set of operators which  indicate all the legal tasks
one might  do to  any  given  concept  to expand that knowledge,  and
(3) collect  the  heuristics  and  strategies  that  experts  in that
field  use  when  trying to  form theories.

This is  what  I've  tried to  do  for some  elementary  branches  of
mathematics. Since few  of you have any professional  interest in set
theory or number theory, I'll limit myself to a few general remarks.


    SLOSH TIME: 30 seconds **************************************


	ACTUAL IMPLEMENTATION: repr and control (2 min)

First  I  selected a  hundred very  fundamental  concepts to  let the
system begin with. There are a couple dozen different  facets to each
concept, but I supplied only  the definitions.  All the system can do
to a concept is  pick a blank facet  and try to fill  it in.  So  the
total number  of possible  activities that  the system  faces at  any
moment is about hk, where there are k concepts and each one has about
h facets.  The legal operators  can be seen to correspond to the  set
of facets.

I  used  introspection  to  construct   a  list  of  several  hundred
heuristics.  The final task was to decide on suitable representations
and algorithms.

Each  concept  has a  property  list  indicating  its  known  facets.
<slide: exs, defns, domain/range, generalizations,...>

We can treat the relevant heuristics as just another facet of each
concept.  Each heuristic or strategy will be written as a simple
production rule.  When the system is working on some particular
activity, all the relevant heuristics will be thrown together into
one production system and run.
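A minimal sketch of this representation, with concepts as property
lists of facets and heuristics as (condition, action) production
rules stored as just another facet.  The names, facets, and the
sample rule here are mine, not AM's actual code:

```python
# A concept as a property list; one facet holds its production rules.
primes = {
    "name": "Primes",
    "definitions": ["n has exactly 2 divisors"],
    "examples": [2, 3, 5, 7, 11],
    "generalizations": ["Numbers"],
    "heuristics": [],   # (condition, action) rules relevant to this concept
}

# A production rule: if the condition holds, the action suggests a job.
def few_examples(concept):
    return len(concept.get("examples", [])) < 10

def suggest_more_examples(concept):
    return ("fill-in", concept["name"], "examples", "concept has few examples")

primes["heuristics"].append((few_examples, suggest_more_examples))

# Running the relevant rules as one small production system:
for condition, action in primes["heuristics"]:
    if condition(primes):
        print(action(primes))
```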

What does the system actually do, now? It expands its knowledge, both
by   finding  out   more  about   the  already-known   concepts,  and
occasionally by  defining new  ones.   That  means it  fills out  the
properties  of existing  data  structures, and  sometimes  creates new
ones.

The flow  of  control  is  governed by  a  job-list,  a  sequence  of
suggested activities  to do, an  ordered list  of candidates for  the
system's  attention. Each candidate  job is proposed  by a heuristic.
It specifies that a  plausible thing to do is  to fill out a  certain
facet of  a certain concept.  The job is also  tagged with a  list of
definite  reasons for why  it might be  worth doing.   Based on those
reasons, the job is assigned a priority rating from 0 to 1000.

A very crucial mechanism comes into play if the newly-proposed job
was already in the job-list.  In such cases, the list of reasons is
examined carefully.  If the job is being proposed for pretty much the
same reasons as before, then its priority rating is incremented
slightly.  But if it is being proposed for a new reason, then its
priority is increased tremendously.
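The proposal-and-merging mechanism can be sketched as follows.  The
0-to-1000 scale is from the talk; the particular increments and the
data layout are invented for illustration:

```python
# Job-list: each job is keyed by (concept, facet) and carries its reasons.
job_list = {}   # (concept, facet) -> {"reasons": set, "priority": int}

def propose(concept, facet, reason, rating):
    key = (concept, facet)
    job = job_list.get(key)
    if job is None:
        job_list[key] = {"reasons": {reason}, "priority": min(rating, 1000)}
    elif reason in job["reasons"]:
        # Proposed again for the same reason: increment slightly.
        job["priority"] = min(job["priority"] + 10, 1000)
    else:
        # Proposed for a genuinely new reason: increase tremendously.
        job["reasons"].add(reason)
        job["priority"] = min(job["priority"] + 300, 1000)

propose("Primes", "examples", "few examples known", 200)        # -> 200
propose("Primes", "examples", "few examples known", 200)        # -> 210
propose("Primes", "examples", "extreme case of Divisors", 200)  # -> 510
print(job_list[("Primes", "examples")]["priority"])             # 510
```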

The basic flow of control is quite simple. The system looks  over the
job-list, and  tries to  execute the job  with the  highest priority.
Its  priority rating  indicates the  number of milliseconds  to spend
before  giving up  and  trying  the next  one.    In the  process  of
performing the job, new  jobs may be suggested, new concepts created,
blank facets filled in, etc.
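That control loop, with the priority rating doubling as a time
budget, might look like this (a sketch under my own assumptions
about the job structure, not AM's code):

```python
import time

def run(job_list, execute_step):
    """Take the highest-priority job; its rating sets the time budget."""
    while job_list:
        job_list.sort(key=lambda j: j["priority"], reverse=True)
        job = job_list.pop(0)
        deadline = time.monotonic() + job["priority"] / 1000.0  # rating -> ms
        done = False
        while not done and time.monotonic() < deadline:
            done = execute_step(job, job_list)  # may propose new jobs

# Toy executor: finishes every job in one step and records the order.
order = []
def step(job, jobs):
    order.append(job["name"])
    return True

run([{"name": "A", "priority": 300}, {"name": "B", "priority": 900}], step)
print(order)   # ['B', 'A']
```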

Well, in any event, that was how AM was designed and created. You now
know all the essentials of its representations and algorithms.


	PERFORMANCE and problems (1.5 min)

The system started with 100 prenumerical concepts (k=100), and about
half of the 100 new concepts it proposed were interesting.  Many of
them were several levels deep (s=7), for example the notions of
numbers, multiplication, divisors, prime numbers, and unique
factorization.  The heuristics guided the system to look at only 200
nodes in a space of size (h↑7)k [write: (25↑7) x 100], which is about
600 billion.  This whole development used only an hour or so of cpu
time.  The system is written in INTERLISP.

The result  I'm proudest  of  is the  discovery  of an  entirely  new
theory: a new  way to characterize  numbers which have  an abnormally
large number of divisors.
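The flavor of that discovery can be suggested in a few lines: numbers
with more divisors than any smaller number, the opposite extreme from
primes under the same "divisors-of" function.  This reconstruction is
mine, not AM's actual definition:

```python
def num_divisors(n: int) -> int:
    return sum(1 for d in range(1, n + 1) if n % d == 0)

# Keep every number that beats the divisor-count record so far.
record, winners = 0, []
for n in range(1, 50):
    d = num_divisors(n)
    if d > record:
        record, winners = d, winners + [n]
print(winners)   # [1, 2, 4, 6, 12, 24, 36, 48]
```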

The major  problem is providing heuristics  which are powerful enough
to  recognize quickly  whether  each  new  concept  is  going  to  be
interesting or a loser.

Of  course,  the  standard problem  in our business is the tremendous
gap  between plausible English prose that  will convince a human, and
debugged code that  will convince a computer.   When talking to  you,
it's fine for me to  say that a function is interesting if its domain
and range sets coincide; but for my system, I have to specify exactly
how  to calculate  that interest  value.   The  precision  I need  to
instruct the  machine still catches me unaware  sometimes, and I find
myself underestimating the time I'll need to program some idea  I was
sure would be a cinch.
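The domain/range example illustrates the gap: the English version is
one clause, but the machine needs an explicit scoring function,
something like this (an invented measure, not AM's actual code):

```python
def interest_in_coincidence(domain, rng):
    """Crude numeric 'interestingness': how much do domain and range overlap?"""
    domain, rng = set(domain), set(rng)
    if not domain or not rng:
        return 0.0
    return len(domain & rng) / len(domain | rng)   # overlap ratio, 0..1

print(interest_in_coincidence(["Numbers"], ["Numbers"]))  # 1.0, e.g. Squaring
print(interest_in_coincidence(["Numbers"], ["Sets"]))     # 0.0
```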

This problem  can be summed  up as "Just  because you can  talk about
something  doesn't mean you understand it".   There is no solution to
this. It is one reason that we actually write these AI programs. They
force us  to make our ideas  precise.  This implies that  we can talk
about more than we really understand.  Which suggests to me that it's
about time for me to stop talking.


	QUESTIONS (2 min)